{
 "cells": [
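  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fisher's Exact Test for Fourfold (2×2) Table Data\n",
    "\n",
    "When a fourfold table has a total sample size $ n < 40 $ or any expected frequency $ T < 1 $, Fisher's exact test should be used instead of the $ \\chi^2 $ test.\n",
    "\n",
    "## Example\n",
    "\n",
    "This example examines whether hepatitis B immunoglobulin prevents intrauterine HBV infection in rabbits. The data are as follows:"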
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (2, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>group</th><th>positive</th><th>negative</th></tr><tr><td>str</td><td>i64</td><td>i64</td></tr></thead><tbody><tr><td>&quot;control&quot;</td><td>7</td><td>2</td></tr><tr><td>&quot;experimental&quot;</td><td>2</td><td>6</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (2, 3)\n",
       "┌──────────────┬──────────┬──────────┐\n",
       "│ group        ┆ positive ┆ negative │\n",
       "│ ---          ┆ ---      ┆ ---      │\n",
       "│ str          ┆ i64      ┆ i64      │\n",
       "╞══════════════╪══════════╪══════════╡\n",
       "│ control      ┆ 7        ┆ 2        │\n",
       "│ experimental ┆ 2        ┆ 6        │\n",
       "└──────────────┴──────────┴──────────┘"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import polars as pl\n",
    "\n",
    "# Load the 2×2 table: one row per group, with counts of positive and negative outcomes\n",
    "df = pl.read_csv(\"B_09_3-data.csv\")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The expected (theoretical) frequencies are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (2, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>group</th><th>positive</th><th>negative</th></tr><tr><td>str</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;control&quot;</td><td>4.764706</td><td>4.235294</td></tr><tr><td>&quot;experimental&quot;</td><td>4.235294</td><td>3.764706</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (2, 3)\n",
       "┌──────────────┬──────────┬──────────┐\n",
       "│ group        ┆ positive ┆ negative │\n",
       "│ ---          ┆ ---      ┆ ---      │\n",
       "│ str          ┆ f64      ┆ f64      │\n",
       "╞══════════════╪══════════╪══════════╡\n",
       "│ control      ┆ 4.764706 ┆ 4.235294 │\n",
       "│ experimental ┆ 4.235294 ┆ 3.764706 │\n",
       "└──────────────┴──────────┴──────────┘"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
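   ],
   "source": [
    "from scipy.stats import chi2_contingency\n",
    "\n",
    "# Only the expected frequencies are used here, not the chi-square statistic itself\n",
    "res = chi2_contingency(df.select(\"positive\", \"negative\"))\n",
    "\n",
    "# Reattach the group labels to the expected-frequency matrix\n",
    "expected_freq = df.select(\"group\").with_columns(\n",
    "    pl.DataFrame(res.expected_freq, schema=[\"positive\", \"negative\"])\n",
    ")\n",
    "\n",
    "expected_freq"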
   ]
  },
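  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a check on the table above, each expected frequency can be worked out by hand. This is a sketch that assumes the usual formula for a contingency table, where $ n_R $ and $ n_C $ (symbols introduced here for illustration, not taken from the original notebook) are the row total and column total of the cell in row $ R $ and column $ C $:\n",
    "\n",
    "$$ T_{RC} = \\frac{n_R n_C}{n} $$\n",
    "\n",
    "The observed counts give row totals $ 9 $ and $ 8 $, column totals $ 9 $ and $ 8 $, and $ n = 17 $, so:\n",
    "\n",
    "$$ T_{11} = \\frac{9 \\times 9}{17} \\approx 4.765, \\quad T_{12} = T_{21} = \\frac{9 \\times 8}{17} \\approx 4.235, \\quad T_{22} = \\frac{8 \\times 8}{17} \\approx 3.765 $$\n",
    "\n",
    "Every expected frequency exceeds 1, so the $ T < 1 $ condition is not met in this example."
   ]
  },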
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The total sample size ($ n $) is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "17"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sum the four cell counts to get n\n",
    "int(df.select(\"positive\", \"negative\").to_numpy().sum())"
   ]
  },
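  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since $ n < 40 $, Fisher's exact test should be used. It computes the $ P $ value directly from exact probabilities rather than from a $ \\chi^2 $ statistic."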
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "prior odds ratio = 10.5\n",
      "p value = 0.05668449197860963\n"
     ]
    }
   ],
   "source": [
    "from scipy.stats import fisher_exact\n",
    "\n",
    "# fisher_exact runs a two-sided test by default\n",
    "res = fisher_exact(df.select(\"positive\", \"negative\"))\n",
    "\n",
    "print(f\"prior odds ratio = {res.statistic}\")\n",
    "print(f\"p value = {res.pvalue}\")"
   ]
  },
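  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The values printed above can be reproduced by hand. This is a sketch that assumes the standard hypergeometric formulation of Fisher's exact test, with $ a, b, c, d $ (labels introduced here for illustration) denoting the four cells read row by row, so $ a = 7 $, $ b = 2 $, $ c = 2 $, $ d = 6 $. With all margins held fixed, the probability of one particular table is:\n",
    "\n",
    "$$ P_i = \\frac{(a+b)! \\, (c+d)! \\, (a+c)! \\, (b+d)!}{a! \\, b! \\, c! \\, d! \\, n!} $$\n",
    "\n",
    "For the margins in this example, this is equivalent to letting the top-left cell take each possible value $ x = 1, \\dots, 9 $:\n",
    "\n",
    "$$ P(a = x) = \\frac{\\binom{9}{x} \\binom{8}{9 - x}}{\\binom{17}{9}}, \\quad \\binom{17}{9} = 24310 $$\n",
    "\n",
    "The observed table ($ x = 7 $) has probability $ 1008 / 24310 \\approx 0.0415 $. The two-sided $ P $ value adds up every table that is no more probable than the observed one, which here means $ x = 1, 2, 7, 8, 9 $:\n",
    "\n",
    "$$ P = \\frac{9 + 288 + 1008 + 72 + 1}{24310} = \\frac{1378}{24310} \\approx 0.0567 $$\n",
    "\n",
    "The printed odds ratio is the sample cross-product ratio, $ \\frac{ad}{bc} = \\frac{7 \\times 6}{2 \\times 2} = 10.5 $."
   ]
  },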
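  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since $ P = 0.057 > 0.05 $, the difference is not statistically significant at the $ \\alpha = 0.05 $ level, so the data do not support the conclusion that the immunoglobulin is effective in preventing intrauterine HBV infection in rabbits."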
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
